a list of everyone in the city we could contact, it would not be feasible to visit all of them and
measure their SBP. Nor would it be necessary. Using inferential statistics, we could draw a sample
from this population, measure their SBPs, and calculate the mean as a sample statistic. Using this
approach, we could estimate the mean SBP of the population.
But drawing a sample that is representative of the background population depends on probability (as
well as other factors). In the following sections, we explain why samples are valid but imperfect
reflections of the population from which they’re drawn. We also describe the basics of probability
distributions. For a more extensive discussion of sampling, see Chapter 6.
Recognizing that sampling isn’t perfect
As used in epidemiologic research, the terms population and sample can be defined this way:
Population: All individuals in a defined target group. For example, this may be all
individuals in the United States living with a diagnosis of Type II diabetes.
Sample: A subset of the target population actually selected to participate in a study. For example,
this could be patients in the United States living with Type II diabetes who visit a particular clinic
and meet other qualification criteria for the study.
Any sample, no matter how carefully it is selected, is only an imperfect reflection of the population.
This is due to the unavoidable occurrence of random sampling fluctuations called sampling error.
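Sampling error can be sketched with a short simulation. The Python example below is only an illustration under stated assumptions: the population is a made-up set of 100,000 SBP readings (the mean of 125 mmHg and standard deviation of 15 mmHg are hypothetical values, not real data), and the sample size of 50 is arbitrary. It draws five simple random samples and compares each sample mean with the population mean.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 SBP readings in mmHg. The mean (125)
# and standard deviation (15) are illustrative assumptions, not real data.
population = [random.gauss(125, 15) for _ in range(100_000)]
population_mean = statistics.mean(population)
print(f"Population mean: {population_mean:.1f}")

# Draw five simple random samples of 50 and compare each sample mean
# with the population mean. The differences reflect sampling error.
sample_means = []
for i in range(5):
    sample = random.sample(population, 50)
    sample_mean = statistics.mean(sample)
    sample_means.append(sample_mean)
    difference = sample_mean - population_mean
    print(f"Sample {i + 1}: mean = {sample_mean:.1f}, "
          f"difference from population mean = {difference:+.1f}")
```

Each sample mean should land near the population mean without matching it exactly, and no two samples should agree with each other. That spread is what the term sampling error refers to.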
To illustrate sampling error, we obtained a data set from Statista containing the number of private
and public airports in each U.S. state and the District of Columbia in 2011 (available at
https://www.statista.com/statistics/185902/us-civil-and-joint-use-airports-2008/).
We started by making a histogram of the entire data set, which is considered a census
because it contains the entire population of states. A histogram is a visualization used to show the
distribution of numerical data, and is described more extensively in Chapter 9. Here, we briefly
summarize how to read a histogram:
A histogram looks like a bar chart. It is specifically crafted to display a distribution.
The histogram’s y-axis represents the number (or frequency) of individuals in the data that fall in
the numerical ranges (known as classes) of the value being charted, which are listed across the
x-axis. In this case, the y-axis represents the number of states falling in each class.
The histogram’s x-axis represents the classes, or numerical ranges of the value being charted, which
in this case is the number of airports.
We first made a histogram of the census, then we took four random samples of 20 states and made a
histogram of each of the samples. Figure 3-1 shows the results.
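The procedure just described can be sketched in Python. This is a minimal illustration, not the code used to produce the figure: the 51 airport counts are randomly generated stand-ins (the Statista values are not reproduced here), the class width of 250 is an assumption, and the helper names class_counts and print_histogram are hypothetical. The text histograms list the classes down the rows rather than across an x-axis, with one # per state in each class.

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical stand-in for the airport data: 51 made-up counts, one per
# state plus the District of Columbia. These are NOT the Statista values.
census = [random.randint(0, 1999) for _ in range(51)]

CLASS_WIDTH = 250  # assumed width of each class (numerical range)


def class_counts(values, width=CLASS_WIDTH):
    """Count how many values fall in each class [start, start + width)."""
    return Counter((v // width) * width for v in values)


def print_histogram(title, values, width=CLASS_WIDTH):
    """Print a text histogram: one row per class, one '#' per state."""
    counts = class_counts(values, width)
    print(title)
    for start in range(0, 2000, width):
        label = f"{start}-{start + width - 1}"
        print(f"{label:>10} | {'#' * counts[start]}")
    print()


# Histogram of the census (the entire population of 51).
print_histogram("Census (all 51)", census)

# Four simple random samples of 20, each with its own histogram.
samples = [random.sample(census, 20) for _ in range(4)]
for i, sample in enumerate(samples, start=1):
    print_histogram(f"Sample {i} (n = 20)", sample)
```

Because each sample contains a different set of 20 states, the four sample histograms generally differ from one another and from the census histogram.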